Goto

Collaborating Authors

 control group


AI Digital Twins Are Helping People Manage Diabetes and Obesity

WIRED

As patients and employers look for alternatives to pricey GLP-1 drugs, Silicon Valley startup Twin Health is using AI and wearable sensors to help people make healthier choices. Rodney Buckley has lost 100 pounds in less than a year, not by using a GLP-1 drug but with the help of a digital twin. Last March, the 55-year-old retired firefighter turned village mayor of Third Lake, Illinois, was 376 pounds. He had tried different diets over the years and would typically lose some weight but eventually gain it back. When his wife's employer started offering a program from startup Twin Health, he thought he would give it a try.




Robust X-Learner: Breaking the Curse of Imbalance and Heavy Tails via Robust Cross-Imputation

Uehara, Eichi

arXiv.org Machine Learning

Estimating Heterogeneous Treatment Effects (HTE) in industrial applications such as AdTech and healthcare presents a dual challenge: extreme class imbalance and heavy-tailed outcome distributions. While the X-Learner framework effectively addresses imbalance through cross-imputation, we demonstrate that it is fundamentally vulnerable to "Outlier Smearing" when reliant on Mean Squared Error (MSE) minimization. In this failure mode, the bias from a few extreme observations ("whales") in the minority group is propagated to the entire majority group during the imputation step, corrupting the estimated treatment effect structure. To resolve this, we propose the Robust X-Learner (RX-Learner). This framework integrates a redescending γ-divergence objective -- structurally equivalent to the Welsch loss under Gaussian assumptions -- into the gradient boosting machinery. We further stabilize the non-convex optimization using a Proxy Hessian strategy grounded in Majorization-Minimization (MM) principles. Empirical evaluation on a semi-synthetic Criteo Uplift dataset demonstrates that the RX-Learner reduces the Precision in Estimation of Heterogeneous Effect (PEHE) metric by 98.6% compared to the standard X-Learner, effectively decoupling the stable "Core" population from the volatile "Periphery".


SCOP: Scientific Control for Reliable Neural Network Pruning

Neural Information Processing Systems

This paper proposes a reliable neural network pruning algorithm by setting up a scientific control. Existing pruning methods have developed various hypotheses to approximate the importance of filters to the network and then execute filter pruning accordingly. To increase the reliability of the results, we prefer to have a more rigorous research design by including a scientific control group as an essential part to minimize the effect of all factors except the association between the filter and expected network output. Acting as a control group, knockoff feature is generated to mimic the feature map produced by the network filter, but they are conditionally independent of the example label given the real feature map. We theoretically suggest that the knockoff condition can be approximately preserved given the information propagation of network layers. Besides the real feature map on an intermediate layer, the corresponding knockoff feature is brought in as another auxiliary input signal for the subsequent layers. Redundant filters can be discovered in the adversarial process of different features. Through experiments, we demonstrate the superiority of the proposed algorithm over state-of-the-art methods. For example, our method can reduce 57.8% parameters and 60.2% FLOPs of ResNet-101 with only 0.01% top-1 accuracy loss on ImageNet.


Bridging Online Behavior and Clinical Insight: A Longitudinal LLM-based Study of Suicidality on YouTube Reveals Novel Digital Markers

Sobol, Ilanit, Lissak, Shir, Tikochinski, Refael, Nakash, Tal, Klomek, Anat Brunstein, Fruchter, Eyal, Reichart, Roi

arXiv.org Artificial Intelligence

Suicide remains a leading cause of death in Western countries. As social media becomes central to daily life, digital footprints offer valuable insight into suicidal behavior. Focusing on individuals who attempted suicide while uploading videos to their channels, we investigate: How do linguistic patterns on YouTube reflect suicidal behavior, and how do these patterns align with or differ from expert knowledge? We examined linguistic changes around suicide attempts and compared individuals who attempted suicide while actively uploading to their channel with three control groups: those with prior attempts, those experiencing major life events, and matched individuals from the broader cohort. Applying complementary bottom-up, hybrid, and expert-driven approaches, we analyzed a novel longitudinal dataset of 181 suicide-attempt channels and 134 controls. In the bottom-up analysis, LLM-based topic-modeling identified 166 topics; five were linked to suicide attempts, two also showed attempt-related temporal changes (Mental Health Struggles, $OR = 1.74$; YouTube Engagement, $OR = 1.67$; $p < .01$). In the hybrid approach, clinical experts reviewed LLM-derived topics and flagged 19 as suicide-related. However, none showed significant effects beyond those identified bottom-up. YouTube Engagement, a platform-specific indicator, was not flagged, underscoring the value of bottom-up discovery. A top-down psychological assessment of suicide narratives revealed differing motivations: individuals describing prior attempts aimed to help others ($β=-1.69$, $p<.01$), whereas those attempted during the uploading period emphasized personal recovery ($β=1.08$, $p<.01$). By integrating these approaches, we offer a nuanced understanding of suicidality, bridging digital behavior and clinical insights.


Reranking partisan animosity in algorithmic social media feeds alters affective polarization Science

Science

We recruited participants through two online platforms, CloudResearch and Bovitz, targeting US residents over 18 years old who self-identified as either Republican or Democrat and were active users of X (SM section S1.1). Qualified individuals were invited to complete a screening task, which included installing a browser extension that analyzed their X feed. To ensure the interventions could have a meaningful impact on participants' feeds, only those with at least 5% of posts related to politics or social issues were invited to participate. Figure S1 summarizes the recruitment funnel, including the number of individuals at each stage of the process. Participants were not instructed to use X in any particular way, but they received daily reminders if they had not used the platform that day.


MindSET: Advancing Mental Health Benchmarking through Large-Scale Social Media Data

Mankarious, Saad, Zirikly, Ayah, Wiechmann, Daniel, Kerz, Elma, Kempa, Edward, Qiao, Yu

arXiv.org Artificial Intelligence

Social media data has become a vital resource for studying mental health, offering real-time insights into thoughts, emotions, and behaviors that traditional methods often miss. Progress in this area has been facilitated by benchmark datasets for mental health analysis; however, most existing benchmarks have become outdated due to limited data availability, inadequate cleaning, and the inherently diverse nature of social media content (e.g., multilingual and harmful material). We present a new benchmark dataset, \textbf{MindSET}, curated from Reddit using self-reported diagnoses to address these limitations. The annotated dataset contains over \textbf{13M} annotated posts across seven mental health conditions, more than twice the size of previous benchmarks. To ensure data quality, we applied rigorous preprocessing steps, including language filtering, and removal of Not Safe for Work (NSFW) and duplicate content. We further performed a linguistic analysis using LIWC to examine psychological term frequencies across the eight groups represented in the dataset. To demonstrate the dataset utility, we conducted binary classification experiments for diagnosis detection using both fine-tuned language models and Bag-of-Words (BoW) features. Models trained on MindSET consistently outperformed those trained on previous benchmarks, achieving up to an \textbf{18-point} improvement in F1 for Autism detection. Overall, MindSET provides a robust foundation for researchers exploring the intersection of social media and mental health, supporting both early risk detection and deeper analysis of emerging psychological trends.


Barriers to AI Adoption: Image Concerns at Work

Almog, David

arXiv.org Artificial Intelligence

Concerns about how workers are perceived can deter effective collaboration with artificial intelligence (AI). In a field experiment on a large online labor market, I hired 450 U.S.-based remote workers to complete an image-categorization job assisted by AI recommendations. Workers were incentivized by the prospect of a contract extension based on an HR evaluator's feedback. I find that workers adopt AI recommendations at lower rates when their reliance on AI is visible to the evaluator, resulting in a measurable decline in task performance. The effects are present despite a conservative design in which workers know that the evaluator is explicitly instructed to assess expected accuracy on the same AI-assisted task. This reduction in AI reliance persists even when the evaluator is reassured about workers' strong performance history on the platform, underscoring how difficult these concerns are to alleviate. Leveraging the platform's public feedback feature, I introduce a novel incentive-compatible elicitation method showing that workers fear heavy reliance on AI signals a lack of confidence in their own judgment, a trait they view as essential when collaborating with AI.


Learning to Call: A Field Trial of a Collaborative Bandit Algorithm for Improved Message Delivery in Mobile Maternal Health

Dasgupta, Arpan, Maniyar, Mizhaan, Srivastava, Awadhesh, Kumar, Sanat, Mahale, Amrita, Hegde, Aparna, Suggala, Arun, Shanmugam, Karthikeyan, Taneja, Aparna, Tambe, Milind

arXiv.org Artificial Intelligence

Mobile health (mHealth) programs utilize automated voice messages to deliver health information, particularly targeting underserved communities, demonstrating the effectiveness of using mobile technology to disseminate crucial health information to these populations, improving health outcomes through increased awareness and behavioral change. India's Kilkari program delivers vital maternal health information via weekly voice calls to millions of mothers. However, the current random call scheduling often results in missed calls and reduced message delivery. This study presents a field trial of a collaborative bandit algorithm designed to optimize call timing by learning individual mothers' preferred call times. We deployed the algorithm with around $6500$ Kilkari participants as a pilot study, comparing its performance to the baseline random calling approach. Our results demonstrate a statistically significant improvement in call pick-up rates with the bandit algorithm, indicating its potential to enhance message delivery and impact millions of mothers across India. This research highlights the efficacy of personalized scheduling in mobile health interventions and underscores the potential of machine learning to improve maternal health outreach at scale.